Skip to content

Release v1.7.4 — external-wait formula + CPU:Elapsed adjustment#260

Merged
erikdarlingdata merged 1 commit intomainfrom
dev
Apr 22, 2026
Merged

Release v1.7.4 — external-wait formula + CPU:Elapsed adjustment#260
erikdarlingdata merged 1 commit intomainfrom
dev

Conversation

@erikdarlingdata
Copy link
Copy Markdown
Owner

Summary

Joe feedback #215 C6 + C7:

  • External/preemptive waits (MEMORY_ALLOCATION_EXT, RESERVED_MEMORY_ALLOCATION_EXT, PREEMPTIVE_*) now use the CPU-share formula so benefit % matches query impact instead of being near-zero.
  • CPU:Elapsed ratio subtracts external-wait time from CPU for a truer picture.

See PR #259.

Test plan

  • Formula math matches Joe's expected 48.3%
  • Serial and parallel reference plans sanity-check
  • Visual post-deploy
  • Verify MwX36LEHJS raw XML when received

🤖 Generated with Claude Code

…nt (#215 C6, C7) (#259)

C6 — external-wait benefit formula:
The existing parallel-wait formula uses per-thread wait = max(0, elapsed - cpu).
External / preemptive waits (MEMORY_ALLOCATION_EXT, RESERVED_MEMORY_ALLOCATION_EXT,
PREEMPTIVE_*) are CPU-busy in kernel, so elapsed ≈ cpu for those threads and the
wait barely shows in that math. Joe's formula:

  wait_cpu_share = wait_ms / total_cpu_ms
  sum_max_cpu    = Σ max-thread-self-cpu across operators
  benefit_ms     = wait_cpu_share * sum_max_cpu

Verified against Joe's worked example on MwX36LEHJS:
  (26307 / 45075) * (11341 + 42) / 13748 = 48.3% ✓

Implementation:
- New IsExternalWait classifier (public so mapping can see it) covering
  MEMORY_ALLOCATION* and PREEMPTIVE_* prefixes.
- OperatorWaitProfile grows MaxThreadCpuMs — per-thread self-CPU
  (non-cumulative), taken from new PlanAnalyzer.GetOperatorMaxThreadOwnCpuMs
  which mirrors GetOperatorOwnElapsedMs. Uses self-CPU so summing across
  operators in a serial plan doesn't double-count cumulative CPU, and
  sums to approximate critical-path CPU in parallel plans.
- ScoreWaitStats routes external waits to CalculateExternalWaitBenefit.
- Serial plans with external waits: sum-of-self-cpu ≈ statement cpu, so
  the formula collapses to wait/elapsed — same as before, no regression.

C7 — CPU:Elapsed numerator:
External waits inflate the CPU number without representing real query CPU,
so the CPU:Elapsed ratio (used in web runtime card, HTML export runtime
card, and Avalonia DOP efficiency) subtracts Σ external-wait ms from CPU
before dividing by elapsed.

- AnalysisResult.QueryTimeResult gains ExternalWaitMs, populated in
  ResultMapper by summing wait stats matching IsExternalWait.
- HtmlExporter.WriteRuntimeCard, Index.razor runtime card, and
  PlanViewerControl DOP-efficiency calc all use (cpu - externalWaitMs).

Version bump 1.7.3 -> 1.7.4.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@erikdarlingdata erikdarlingdata merged commit 48870b0 into main Apr 22, 2026
3 checks passed
Copy link
Copy Markdown
Owner Author

@erikdarlingdata erikdarlingdata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

v1.7.4 release: external/preemptive waits (MEMORY_ALLOCATION_*, PREEMPTIVE_*) now get CPU-share benefit scoring instead of the (elapsed - cpu) per-thread math that missed them, and CPU:Elapsed ratios subtract external-wait time from CPU. New QueryTimeResult.ExternalWaitMs field exposes the delta. Adds PlanAnalyzer.GetOperatorMaxThreadOwnCpuMs to compute per-thread self-CPU for the new formula.

What's good

  • Targeted fix with a specific math rationale (wait_cpu_share * Σ max_thread_cpu) and doc-comments explaining the why.
  • CpuTimeMs - ExternalWaitMs is clamped with Math.Max(0, ...) at every call site, so a bad wait-stats parse can't produce a negative ratio.
  • ScoreWaitStats now collects operatorProfiles unconditionally (not just for parallel), which the new formula needs.

What needs attention

Base branch

PR targets main, not dev. Release PRs or not, that's contrary to the project's branching convention — confirm this is intentional before merging.

Cross-repo sync (PerformanceMonitor)

This touches the plan-analysis rule set (BenefitScorer.ScoreWaitStats, wait classification, external-wait formula) and the PlanAnalyzer API surface (GetOperatorMaxThreadOwnCpuMs). The equivalent change must also land in the PerformanceMonitor Dashboard and Lite copies, or their rule set will drift from Studio's on every external-wait plan. Track separately.

Tests

No changes under tests/PlanViewer.Core.Tests/. Logic added in BenefitScorer.CalculateExternalWaitBenefit, BenefitScorer.IsExternalWait, and PlanAnalyzer.GetOperatorMaxThreadOwnCpuMs has no unit coverage. At minimum:

  • A parallel plan with a MEMORY_ALLOCATION_EXT wait and known per-thread CPU → asserts Joe's 48.3% (test plan references it).
  • A serial plan with PREEMPTIVE_* wait → asserts serial branch of GetOperatorMaxThreadOwnCpuMs.
  • IsExternalWait positive/negative cases, including the RESERVED_MEMORY_ALLOCATION_EXT substring-match variant.

The PR body's "Verify MwX36LEHJS raw XML when received" box is still unchecked; a captured fixture for that plan under tests/PlanViewer.Core.Tests/Plans/ would be the natural home.

Inline items

See line comments for: (1) triplicated external-wait summation across PlanViewerControl.axaml.cs:2845, ResultMapper.cs:114, and BenefitScorer.CalculateExternalWaitBenefit; (2) IsExternalWait Contains("MEMORY_ALLOCATION") being broader than the documented target and conflicting with ClassifyWaitType's "Memory" bucket; (3) OrderByDescending(...).First() picking a single Parallelism-exchange child nondeterministically in GetOperatorMaxThreadOwnCpuMs; (4) parentByThread overwrite on duplicate ThreadId; (5) fallback-to-known-wrong-ratio when stmtCpuMs <= 0; (6) Output → Services dependency in ResultMapper.

No Avalonia-specific issues (no AvaloniaEdit/ComboBox/:pointerover/brush changes). No contrast regressions.


Generated by Claude Code

Comment on lines +2845 to +2849
long externalWaitMs = 0;
foreach (var w in statement.WaitStats)
if (BenefitScorer.IsExternalWait(w.WaitType))
externalWaitMs += w.WaitTimeMs;
var effectiveCpu = Math.Max(0, statement.QueryTimeStats.CpuTimeMs - externalWaitMs);
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Third copy of the external-wait summation (also in ResultMapper.cs:114-119 and implicit in BenefitScorer.CalculateExternalWaitBenefit). Three call sites iterate WaitStats, call IsExternalWait, sum WaitTimeMs. Worth pulling into a single helper on PlanStatement (or BenefitScorer.SumExternalWaitMs(stmt)) so the definition of "external wait" has one home — especially if the rule ever expands beyond MEMORY_ALLOCATION / PREEMPTIVE_*.


Generated by Claude Code

Comment on lines +633 to +634
return wt.Contains("MEMORY_ALLOCATION")
|| wt.StartsWith("PREEMPTIVE_");
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contains("MEMORY_ALLOCATION") is broader than the doc-comment's stated target (MEMORY_ALLOCATION_EXT, RESERVED_MEMORY_ALLOCATION_EXT). Any future wait type with that substring will silently be classified as external and get the CPU-share formula even if the worker isn't CPU-busy in kernel. Tightening to StartsWith("MEMORY_ALLOCATION") || StartsWith("RESERVED_MEMORY_ALLOCATION") — or an explicit allowlist — matches the documented intent.

Also note this diverges from ClassifyWaitType at line 550, which still puts MEMORY_ALLOCATION in the "Memory" category. Two classifiers of the same wait that disagree is a future foot-gun.


Generated by Claude Code

Comment on lines +609 to +610
if (stmtCpuMs <= 0 || stmtElapsedMs <= 0)
return (double)wait.WaitTimeMs / Math.Max(1, stmtElapsedMs) * 100;
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Early-return path when stmtCpuMs <= 0 falls back to wait_ms / elapsed * 100 — the same math that was already failing to surface these waits (per the method's own doc-comment). If the data is missing, returning 0 would be more honest than a known-wrong number.


Generated by Claude Code

{
var childNode = child;
if (child.PhysicalOp == "Parallelism" && child.Children.Count > 0)
childNode = child.Children.OrderByDescending(c => c.ActualCPUMs).First();
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OrderByDescending(c => c.ActualCPUMs).First() picks a single Parallelism-exchange child and discards the other side(s), so any CPU those threads contributed is dropped from the per-thread subtraction. On ties, .First() is effectively arbitrary (enumeration order) which makes maxSelf non-deterministic for identical plan inputs.

Two questions worth confirming: (1) Is picking the max-CPU child actually the right semantic here, versus summing per-thread CPU across all exchange children keyed by ThreadId? (2) Does this mirror the existing elapsed-path helper GetPerThreadOwnElapsed — if not, the two "critical-path" calculations may diverge.


Generated by Claude Code

Comment on lines +1607 to +1609
var parentByThread = new Dictionary<int, long>();
foreach (var ts in node.PerThreadStats)
parentByThread[ts.ThreadId] = ts.ActualCPUMs;
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parentByThread[ts.ThreadId] = ts.ActualCPUMs — if PerThreadStats ever contains two entries for the same ThreadId (e.g., thread reuse across iterations, or parser quirks on some plan shapes), the second silently overwrites the first. Worth a TryGetValue + sum like childSumByThread uses a few lines below, for symmetry and safety.


Generated by Claude Code

@@ -508,7 +507,15 @@ private static void ScoreWaitStats(PlanStatement stmt)
var category = ClassifyWaitType(wait.WaitType);
double benefitPct;
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gate IsExternalWait(...) && operatorProfiles.Count > 0: for an external wait on a plan without a RootNode (estimated-only plan that somehow has wait stats, or parse failures), this falls through to the Parallelism branch test and then to !isParallel || operatorProfiles.Count == 0 → simple ratio. That's the pre-fix behavior, which the doc-comments say is wrong for these waits. Likely fine in practice, but the fallback chain for external waits in degraded data is worth a single explicit branch.


Generated by Claude Code

@@ -1,4 +1,5 @@
using PlanViewer.Core.Models;
using PlanViewer.Core.Services;
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Output layer now depends on PlanViewer.Core.Services to call BenefitScorer.IsExternalWait. Fine as long as you're OK with Output → Services, but the classification is really a model concern — a PlanStatement.ExternalWaitMs property (or a WaitStatInfo.IsExternal computed once during parse) would let both Output and the Avalonia control consume the value without reaching into Services.


Generated by Claude Code

Comment on lines +179 to +186

/// <summary>
/// Sum of external/preemptive wait time (MEMORY_ALLOCATION_*, PREEMPTIVE_*) —
/// these waits are CPU-busy in kernel and inflate CpuTimeMs vs real query CPU.
/// Subtract from CpuTimeMs for a truer CPU:Elapsed ratio.
/// </summary>
[JsonPropertyName("external_wait_ms")]
public long ExternalWaitMs { get; set; }
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New JSON field external_wait_ms is a public output contract — MCP clients, downstream consumers, and the Dashboard/Lite copies of PerformanceMonitor will start seeing this. Worth mentioning in release notes and confirming any consumer that computes its own CPU:Elapsed ratio (e.g., dashboards) gets the same treatment.


Generated by Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant